Search results for "minimal absent word"
showing 3 items of 3 documents
Some Investigations on Similarity Measures Based on Absent Words
2019
In this paper we investigate similarity measures based on minimal absent words, introduced by Chairungsee and Crochemore in [1]. They make use of a length-weighted index on a sample set corresponding to the symmetric difference M(x)ΔM(y) of the minimal absent words M(x) and M(y) of two sequences x and y, respectively. We first propose a variant of this measure by choosing as a sample set a proper subset (x, y) of M(x)ΔM(y), which appears to be more appropriate for distinguishing x and y. From the algebraic point of view, we prove that (x, y) is the base of the ideal generated by M(x)ΔM(y). We then remark that such measures are able to recognize whether the sequences x and y share a common s…
Minimal Absent Words in Rooted and Unrooted Trees
2019
We extend the theory of minimal absent words to (rooted and unrooted) trees, having edges labeled by letters from an alphabet \(\varSigma \) of cardinality \(\sigma \). We show that the set \(\text {MAW}(T)\) of minimal absent words of a rooted (resp. unrooted) tree T with n nodes has cardinality \(O(n\sigma )\) (resp. \(O(n^{2}\sigma )\)), and we show that these bounds are realized. Then, we exhibit algorithms to compute all minimal absent words in a rooted (resp. unrooted) tree in output-sensitive time \(O(n+|\text {MAW}(T)|)\) (resp. \(O(n^{2}+|\text {MAW}(T)|)\) assuming an integer alphabet of size polynomial in n.
Bacteria classification using minimal absent words
2017
Bacteria classification has been deeply investigated with different tools for many purposes, such as early diagnosis, metagenomics, phylogenetics. Classification methods based on ribosomal DNA sequences are considered a reference in this area. We present a new classificatier for bacteria species based on a dissimilarity measure of purely combinatorial nature. This measure is based on the notion of Minimal Absent Words, a combinatorial definition that recently found applications in bioinformatics. We can therefore incorporate this measure into a probabilistic neural network in order to classify bacteria species. Our approach is motivated by the fact that there is a vast literature on the com…